Journal of Open Source Software
● The Open Journal
Preprints posted in the last 90 days, ranked by how well they match Journal of Open Source Software's content profile, based on 22 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Zhu, Y.; Lionts, M. M.; Haugen, E.; Walter, A. B.; Voss, T. R.; Grow, G. R.; Liao, R.; McKee, M. E.; Locke, A.; Hiremath, G.; Mahadevan-Jansen, A.; Huo, Y.
Raman spectroscopy offers a uniquely rich window into molecular structure and composition, making it a powerful tool across fields ranging from materials science to biology. However, the reproducibility of Raman data analysis remains a fundamental bottleneck. In practice, transforming raw spectra into meaningful results is far from standardized: workflows are often complex, fragmented, and implemented through highly customized, case-specific code. This challenge is compounded by the lack of unified open-source pipelines and the diversity of acquisition systems, each introducing its own file formats, calibration schemes, and correction requirements. Consequently, researchers must frequently rely on manual, ad hoc reconciliation of processing steps. To address this gap, we introduce TRaP (Toolbox for Reproducible Raman Processing), an open-source, GUI-based Python toolkit designed to bring reproducibility, transparency, and portability to Raman spectral analysis. TRaP unifies the entire preprocessing-to-analysis pipeline within a single, coherent framework that operates consistently across heterogeneous instrument platforms (e.g., Cart, Portable, Renishaw, and MANTIS). Central to its design is the concept of fully shareable, declarative workflows: users can encode complete processing pipelines into a single configuration file (e.g., JSON), enabling others to reproduce results instantly without reimplementing code or reverse-engineering undocumented steps. Beyond convenience, TRaP integrates configuration management, X-axis calibration, spectral response correction, interactive processing, and batch execution into a workflow-driven architecture that enforces deterministic, repeatable operations. Every transformation is explicitly recorded, making the full processing history transparent, inspectable, and reproducible. 
This eliminates ambiguity in how results are generated and ensures that identical protocols can be applied consistently across datasets and experimental contexts. Through representative use cases, we show that TRaP enables seamless, reproducible preprocessing of Raman spectra acquired from diverse platforms within a unified environment. We hope TRaP can establish Raman data processing as a reproducible, shareable, and systematized scientific practice, aligned with modern standards for computational research. TRaP is released as open-source software at https://github.com/hrlblab/TRaP
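The declarative, JSON-encoded workflows the abstract describes can be illustrated with a minimal sketch. Everything below (the step names, parameters, and registry) is invented for illustration and is not TRaP's actual API; it only shows the shape of the idea: a pipeline encoded in one shareable configuration file, applied deterministically, with every transformation recorded.

```python
import json

# Hypothetical processing steps (NOT TRaP's real functions):
def baseline_subtract(spectrum, offset=0.0):
    return [y - offset for y in spectrum]

def normalize(spectrum):
    peak = max(spectrum)
    return [y / peak for y in spectrum]

STEPS = {"baseline_subtract": baseline_subtract, "normalize": normalize}

def run_pipeline(spectrum, config_json):
    """Apply the steps named in a JSON config, recording the full history."""
    config = json.loads(config_json)
    history = []
    for step in config["pipeline"]:
        fn = STEPS[step["name"]]
        spectrum = fn(spectrum, **step.get("params", {}))
        history.append(step)  # processing history stays inspectable
    return spectrum, history

# A shareable config: anyone holding this string can rerun the same pipeline.
config = ('{"pipeline": [{"name": "baseline_subtract", "params": {"offset": 1.0}},'
          ' {"name": "normalize"}]}')
processed, history = run_pipeline([2.0, 3.0, 5.0], config)
```

Sharing the configuration string alone is enough to reproduce `processed` exactly, which is the reproducibility property the toolbox is built around.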
Freese, N. H.; Raveendran, K.; Sirigineedi, J. S.; Chinta, U. L.; Badzuh, P.; Marne, O.; Shetty, C.; Naylor, I.; Jagarapu, S.; Loraine, A.
Summary: Track Hub Quickload Translator is a web application that interconverts University of California Santa Cruz (UCSC) Genome Browser track hub and Integrated Genome Browser (IGB) data repository formats by translating the track hub or Quickload configuration files into the format the other genome browser requires. This new work enables researchers to work with tens of thousands of published genome assemblies for the first time using either browser. Availability and Implementation: Track Hub Quickload Translator is implemented in Python 3 and freely available at translate.bioviz.org. Integrated Genome Browser is available from BioViz.org. Source code for Track Hub Quickload Translator, GenArk Genomes, and the Integrated Genome Browser is available from github.org/lorainelab. Contact: aloraine@charlotte.edu
Turkington, C.; Bastiaanssen, F.; Nezam-Abadi, N.; Shkoporov, A. N.; Hill, C.
Bacterial taxonomic type strains anchor species names to physical and genomic reference material, making them essential for reproducible and comparable prokaryotic research. While reference strains are often well characterised through curated metadata, nomenclature histories, and sequence records, no single database holds up-to-date information on all these aspects, resulting in fragmented information. Gathering the complete set of information for a type strain is further complicated by inconsistencies in nomenclature between sources, due to the often-numerous synonyms that can describe a single strain. As a result, collecting type strain data for taxonomic proposals and emendations can be an onerous task requiring extensive manual curation. To address this issue, we introduce Ratatoskr, a Python-based tool that automates the retrieval of sequences and metadata for bacterial taxonomic type strains. Ratatoskr collects the latest type strain information from the List of Prokaryotic names with Standing in Nomenclature (LPSN) and uses this information to query the BacDive and NCBI databases. By applying known taxonomic synonym information, Ratatoskr resolves cross-database inconsistencies and streamlines the retrieval process. We show that Ratatoskr can obtain metadata and sequence data for bacterial type strains within seconds to minutes, depending on the number of members in the requested taxon. By automating this retrieval, Ratatoskr provides fast, accurate, and readily shareable starting points for studies involving taxonomic type strains and their data, such as new taxonomic proposals or emendations. Data summary: Ratatoskr was developed using Python 3 and is freely available at https://github.com/Fabian-Bastiaanssen/Ratatoskr under a GPL-3.0 licence.
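The synonym-resolution step the abstract describes can be sketched generically: known synonyms are collapsed to one canonical strain name before querying, so records keyed differently in different databases can be merged. The names and record fields below are illustrative only, not Ratatoskr's actual data model.

```python
# Hypothetical synonym table: every alias maps to one canonical name.
SYNONYMS = {
    "Lactobacillus casei ATCC 393": "Lacticaseibacillus casei ATCC 393",
    "L. casei DSM 20011": "Lacticaseibacillus casei ATCC 393",
}

def canonical(name):
    """Resolve a strain name to its canonical form (identity if unknown)."""
    return SYNONYMS.get(name, name)

# Two databases refer to the same strain under different synonyms:
records_a = {canonical("Lactobacillus casei ATCC 393"): {"metadata": "BacDive record"}}
records_b = {canonical("L. casei DSM 20011"): {"sequence": "NCBI assembly"}}

# After canonicalisation both key the same strain, so records merge cleanly.
merged = {k: {**records_a.get(k, {}), **records_b.get(k, {})}
          for k in records_a | records_b}
```

Without the canonicalisation step the two records would remain under separate keys, which is exactly the cross-database inconsistency the tool automates away.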
Lee, A. J.; Sanin, D. E.
Introduction: Common spatial transcriptomic analysis pipelines in R focus on pre-processing and visualization, while providing limited and indirect methods to leverage true spatially resolved quantification of transcripts. Often, x,y-coordinates in spatial transcriptomics (ST) data are integrated into analysis via "spatially aware" normalization (Salim et al., 2024), clustering methods (Zhao et al., 2021), or the identification of spatially variable genes (Yan et al., 2025). Though useful, these methods give analysts no opportunity to adjust or interrogate the underlying graphs that define adjacencies between spots in their data. Here, we present SpotGraphs, a package that gives the user a more direct and flexible way to interact with the x,y-coordinates of their ST data in R through the existing igraph infrastructure (Antonov et al., 2023; Csardi et al., 2025; Csardi & Nepusz, 2006). Similar functionality exists in Python through SquidPy's graph API (Palla et al., 2022), and we compare results obtained from both packages, demonstrating similar performance. Additionally, we provide a set of tools useful for ST data analysis, including a toolkit to filter low-quality spots lying on tissue debris beyond arbitrary thresholds, edit spot-level adjacencies based on spatial clusters, and identify centers or boundaries of user-defined neighborhoods of interest.
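The core object the abstract describes, an editable adjacency graph built directly from spot x,y-coordinates, can be sketched in a few lines. This is a generic distance-threshold illustration, not SpotGraphs' (or SquidPy's) actual API; real ST graphs carry richer structure via igraph.

```python
import math

def adjacency(coords, max_dist):
    """Connect spots whose Euclidean distance is at most max_dist."""
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        for j in range(i + 1, len(coords)):
            xj, yj = coords[j]
            if math.hypot(xi - xj, yi - yj) <= max_dist:
                edges.add((i, j))
    return edges

spots = [(0, 0), (1, 0), (5, 5)]   # synthetic spot coordinates
g = adjacency(spots, max_dist=1.5)
# Because the graph is a plain data structure, the analyst can edit it
# directly, e.g. cutting an adjacency across a spatial cluster boundary:
g.discard((0, 1))
```

The point of exposing the graph rather than hiding it inside "spatially aware" methods is exactly this last line: adjacencies become something the analyst can interrogate and edit.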
Deolankar, S.; Wermeling, F.
CRISPR screen data provides a valuable resource for understanding gene function and identifying potential drug targets. Here, we present Correlate, a freely accessible web application (https://correlate.cmm.se) that enables exploration of the Cancer Dependency Map (DepMap) CRISPR screen gene effects, hotspot mutations, and translocation/fusion data across more than 1,000 human cancer cell lines. The application supports two main use cases: (i) analysis of user-defined gene sets (e.g. CRISPR screen hits) to identify functionally linked genes based on correlations while providing an overview based on essentiality or user-provided screen statistics; and (ii) exploration of genes of interest in defined biological contexts, such as specific cancer types or mutational backgrounds, to generate hypotheses about gene function and dependencies. Additionally, Correlate supports experimental design by providing rapid overviews of gene essentiality and enabling the identification of cell lines with relevant mutational profiles. In contrast to knowledge-based approaches such as STRING and GSEA, which rely on prior biological annotations and curated interaction networks, Correlate identifies gene connections directly from functional CRISPR screen readouts, offering a complementary and data-driven perspective on gene network analysis. The application runs entirely in the browser, requires no installation or login, and integrates with the Green Listed v2.0 tool family for custom CRISPR screen design.
Highlights:
- Interactive web-based platform for bulk correlation analysis of user-defined gene sets using DepMap CRISPR screen data, requiring no installation or programming expertise.
- Identifies functional gene relationships from CRISPR screen readouts rather than curated annotations, offering a data-driven complement to tools such as GSEA and STRING.
- Enables contextual exploration of gene dependencies across cancer types and mutational backgrounds, supporting hypothesis generation about gene function and therapeutic targets.
- Supports experimental design through gene essentiality overviews, mutation and fusion analysis, and cell line identification, with optional integration of user-provided statistics from CRISPR screens, proteomics, or transcriptomics analyses.
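The correlation-based linkage the abstract describes rests on a simple principle: genes whose knockout effects rise and fall together across many cell lines are candidates for functional linkage. A minimal sketch with synthetic effect scores (the gene vectors below are invented, not DepMap data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic gene-effect scores across five hypothetical cell lines
# (more negative = stronger dependency):
gene_a = [-1.2, -0.9, -0.1, -1.1, -0.2]
gene_b = [-1.1, -1.0, -0.2, -0.9, -0.3]   # tracks gene_a: candidate linkage
r = pearson(gene_a, gene_b)
```

A high `r` across a thousand cell lines, computed from screen readouts rather than curated annotations, is the data-driven signal Correlate surfaces.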
Sosa, S.; Brooke McElreath, M.; Ross, C.
Bayesian modeling is a powerful paradigm in modern statistics and machine learning. However, practitioners face significant obstacles in building bespoke models.
- The landscape of Bayesian software is fragmented across programming languages and abstraction levels. Newcomers often gravitate towards high-level interfaces, like R, in order to use simple generalized linear models (GLMs) through interfaces like brms.
- For niche problems, researchers must often transition to writing directly in lower-level languages and frameworks, like Stan or JAX, which require specialist knowledge.
- Furthermore, computational demands remain a significant bottleneck, often limiting the feasibility of applying Bayesian methods to large datasets and complex, high-dimensional models.
- Bayesian Inference (BI) is cross-platform software distributed as a Python, R, and Julia library. It provides an intuitive model-building syntax with the flexibility of low-level abstraction coding, while also providing pre-built GLM functions. Further, by facilitating hardware-accelerated GPU computation under the hood, BI permits high-dimensional models to be fit in a fraction of the time of comparable Stan models (up to 200-fold faster).
Schuster, J.; Zeglinski, K.; Xiao, L. C.; Voulgaris, O.; Rivera, S. M.; Vervoort, S. J.; Ritchie, M. E.; Gouil, Q.; Clark, M. B.
The wide variety of protocols and applications for DNA and RNA sequencing makes flexible tools for read processing an important step in sequence analysis. Beyond trimming and demultiplexing, custom read-level processing is commonly required for data exploration, QC and analysis. Existing tools are often task-specific and don't generalise to new bioinformatic problems. Thus, there is a need for a tool flexible enough to handle the full variety of read processing tasks, and fast and scalable enough to retain high performance on growing sequencing datasets. We introduce matchbox, a read processor that enables fluent manipulation and analysis of FASTA/FASTQ/SAM/BAM files. With a lightweight scripting language designed around error-tolerant pattern-matching, users can write their own matchbox scripts to tackle a wide variety of bioinformatic problems, and incorporate them into existing pipelines and workflows. We demonstrate matchbox's versatility in a number of contexts: demultiplexing long-read scRNA-seq data with 10X or SPLiT-seq barcodes; restranding RNA-seq reads; assessing CRISPR editing efficiency; and haplotyping macrosatellite repeat regions. matchbox achieves computational performance comparable to existing tools, while addressing a broader range of bioinformatic needs, representing a new state of the art in sequence processing. matchbox is implemented in Rust and available open-source at https://github.com/jakob-schuster/matchbox.
Schilder, B. M.; Skene, N. G.; Murphy, A. E.
Motivation: Mapping genes across identifier systems and species is a routine but critical step in bioinformatics workflows. Despite its ubiquity, gene mapping is frequently handled using bespoke, ad hoc solutions, increasing duplicated effort and introducing opportunities for error. These issues are exacerbated by the prevalence of non-one-to-one homolog relationships and inconsistent handling of gene identifiers across species and databases, which can compromise downstream analyses and reproducibility. Results: We present orthogene, an R/Bioconductor package that simplifies gene mapping within and across hundreds of species. orthogene provides a unified, workflow-oriented framework that integrates automated species and identifier standardization, homolog inference across multiple databases, flexible handling of ambiguous homolog relationships, and transformation of gene lists, tables, and high-dimensional matrices into analysis-ready formats. By abstracting common sources of technical complexity while retaining user control, orthogene enables transparent, reproducible, and scalable gene mapping across a wide range of biological contexts. Availability: https://bioconductor.org/packages/orthogene Contact: brian_schilder@alumni.brown.edu
d'Oelsnitz, S.; Zhao, N. N.; Talla, P.; Jeong, J.; Love, J. D.; Springer, M.; Silver, P. A.
Prokaryotic transcription factors (TFs) are used as small molecule biosensors with broad applications in biotechnology, yet only a small fraction from microbial genomes have been characterized. To address this gap, we recently described the bioinformatic method Ligify, which leverages information from genome context and enzyme reaction databases to predict a TF's cognate effector molecule. Here we report Ligify 2.0, a modern web server for Ligify predictions. We systematically evaluate 10,965 small molecules within the Rhea enzyme reaction database for associations to TFs, ultimately generating 13,435 hypothetical interactions between 1,362 small molecules and 3,164 TFs. We then develop an interactive web server (https://ligify.groov.bio) to search and visualize prediction data. Each TF sensor page includes visualizations of chemical ligand structures, interactive TF protein structures, and genome context. Pages also include metadata links, predicted promoter sequences, prediction confidence metrics, and references to relevant literature. A plasmid builder tool enables users to generate custom biosensor circuit designs. Finally, we provide case studies using Ligify 2.0 to identify two TFs from the pathogens Escherichia coli O157:H7 and Mycobacterium abscessus responsive to 4-hydroxybenzoate and Pseudomonas Quinolone Signal, respectively. The Ligify web server aims to facilitate the systematic characterization of biosensors for chemical control of biological systems.
Key points:
- Ligify 2.0 contains >13,000 predicted transcription factor-small molecule interactions
- A rich web interface provides interactive visualizations and a plasmid design tool
- Predicted ligands for regulators from pathogenic bacteria are experimentally validated
Graphic abstract (Figure 1).
Gohl, P.; Fornes, O.; Bota, P. M.; Messeguer, A.; Bonet, J.; Molina-Fernandez, R.; Planas-Iglesias, J.; Hernandez, A. C.; Gallego, O.; Fernandez-Fuentes, N.; Oliva, B.
Summary: The ModCRElib package provides various tools for the analysis and modelling of transcription factor (TF)-DNA and regulatory complex inter-protein interactions. It takes structural information on these interactions to predict TF binding motifs, generate binding profiles along DNA sequences that score binding affinity, predict TF binding sites, and model the structure of higher-order regulatory complexes. It is capable of working with a variety of input data formats and sources. The user may follow the analysis pipeline as outlined in the documentation, or make use of any of the multiple functionalities in isolation. The package takes the service offered by the ModCRE server and enables users to apply its tools in an unrestricted and customizable manner. In this paper we provide five example uses of ModCRElib: (i) TF binding affinity prediction, (ii) TF binding aggregation, (iii) characterization of specificity in TF binding sites along target DNA sequences, (iv) modelling TFs bound to predicted binding sites, and (v) the generation of statistical-potential-derived scoring profiles of TFs interacting with DNA. Availability: https://github.com/structuralbioinformatics/ModCRElib Contact: baldo.oliva@upf.edu Supplementary information: Available at https://github.com/structuralbioinformatics/ModCRElib. doi:10.5281/zenodo.17484081
Jang, L. S.-e.; Cha, S.; Steinegger, M.
Terminal-based workflows are central to large-scale structural biology, particularly in high-performance computing (HPC) environments and SSH sessions. Yet no existing tool enables real-time, interactive visualization of protein backbone structures directly within a text-only terminal. To address this gap, we present StrucTTY, a fully interactive, terminal-native protein structure viewer. StrucTTY is a single self-contained executable that loads multiple PDB and mmCIF files, normalizes three-dimensional coordinates, and renders protein structures as ASCII graphics. Users can rotate, translate, and zoom in on structures, adjust visualization modes, inspect chain-level features, and view secondary structure assignments. The tool supports simultaneous visualization of up to nine protein structures and can directly display structural alignments using Foldseek's output, enabling rapid comparative analysis in headless environments. The source code is available at https://github.com/steineggerlab/StrucTTY.
Key Messages:
- Real-time, interactive protein structure visualization directly within text-only terminals
- ASCII-based, depth-aware rendering of PDB and mmCIF backbone structures
- Multi-structure comparison with direct application of Foldseek alignment transformations
- Designed for headless workflows on remote servers and HPC systems
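The depth-aware ASCII rendering idea can be shown with a toy sketch: project 3-D coordinates onto a 2-D character grid, letting depth select the glyph. This is a generic illustration of the technique, not StrucTTY's renderer; the coordinates and glyph ramp below are invented.

```python
GLYPHS = ".:o@"   # deeper along z -> denser character

def render(coords, width=20, height=8):
    """Project (x, y, z) points onto a width x height character grid."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    zs = [c[2] for c in coords]

    def scale(v, lo, hi, n):
        # Map v in [lo, hi] to an integer cell index in [0, n-1].
        return 0 if hi == lo else min(n - 1, int((v - lo) / (hi - lo) * (n - 1)))

    grid = [[" "] * width for _ in range(height)]
    for x, y, z in coords:
        col = scale(x, min(xs), max(xs), width)
        row = scale(y, min(ys), max(ys), height)
        depth = scale(z, min(zs), max(zs), len(GLYPHS))
        grid[row][col] = GLYPHS[depth]
    return "\n".join("".join(r) for r in grid)

# Three synthetic "backbone atoms" at increasing depth:
art = render([(0, 0, 0), (5, 2, 3), (9, 7, 9)])
```

Rotation and zoom then amount to transforming the coordinates before re-projecting, which is what makes a purely text-mode viewer interactive.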
Shirali, H.; Boese, N. E.; Kramer, M.; Wang, J.; Klug, N.; Mikut, R.; Meier, R.; Pylatiuk, C.; Wuehrl, L.
The rapid decline of global biodiversity necessitates scalable and accessible tools for monitoring insect populations, yet the high cost and slow pace of specimen digitization remain significant bottlenecks. To address this challenge, we present the Entomoscope 2.0, an open-source platform that integrates a low-cost photomicroscope with an AI-integrated software suite, ENIMAS 2.0. The system combines optimized hardware with an end-to-end software module for digital specimen curation. This module comprises automated specimen cropping, background standardization, morphometric analysis using an Oriented Bounding Box (OBB) with human-in-the-loop (HITL) supervision, and a flexible interface for rapid taxonomic screening using custom AI identification models. With a material cost of only €400, the system offers a cost-effective alternative to expensive commercial solutions. We compare results from the Entomoscope 2.0 with a high-end commercial system (Keyence) using 54 insect specimens and demonstrate the efficiency of the proposed AI workflow. Entomoscope 2.0 completed the whole digitization process in an average of 54.6 seconds per specimen, a 2.28-fold speed increase over the Keyence system's multi-step workflow (124.3 seconds). Crucially, all hardware specifications and construction manuals are freely available to support widespread adoption. By lowering financial barriers and accelerating research workflows, the Entomoscope 2.0 platform offers a practical solution to enable high-throughput digitization for researchers, educators, and citizen scientists.
Fan, B.; Bilodeau, A.; Beaupre, F.; Wiesner, T.; Gagne, C.; Lavoie-Cardinal, F.; Hlozek, R.
Significance: Fluorescence-based Ca2+-imaging is a powerful tool for studying localized neuronal activity, including miniature Synaptic Calcium Transients, providing real-time insights into synaptic activity. These transients induce only subtle changes in the fluorescence signal, often barely above baseline, which poses a significant challenge for automated synaptic transient detection and segmentation. Aim: Detecting astronomical transients similarly requires efficient algorithms that remain robust over a large field of view with varying noise properties. We leverage techniques used in astronomical transient detection for miniature Synaptic Calcium Transient detection in fluorescence microscopy. Approach: We present Astro-BEATS, an automatic miniature Synaptic Calcium Transient segmentation algorithm, designed for Ca2+-imaging videos, that incorporates image estimation and source-finding techniques used in astronomy. Astro-BEATS uses the Rolling Hough Transform filament detector to construct an estimate of the expected (transient-free) fluorescence signal of both the dendritic foreground and the background. Subtracting this baseline signal yields difference images displaying transient signals. We use Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to find sources clustered in space and time. Results: Astro-BEATS outperforms current threshold-based approaches for synaptic Ca2+ transient detection and segmentation. The produced segmentation masks can be used to train a supervised deep learning algorithm for improved synaptic Ca2+ transient detection in Ca2+-imaging data. The speed of Astro-BEATS and its applicability to previously unseen datasets without re-optimization make it particularly useful for generating training datasets for deep learning-based approaches.
Conclusion: Astro-BEATS greatly reduces the time needed for the annotation of synaptic Ca2+ transients and removes the significant overhead of human expert annotation, enabling consistent analysis of new Ca2+-imaging datasets.
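The baseline-subtraction scheme at the heart of the approach can be sketched in one dimension. Astro-BEATS estimates the transient-free baseline with a Rolling Hough Transform and clusters detections with DBSCAN; this sketch deliberately swaps in a simple running-median baseline and a fixed threshold, purely to show the overall shape of "estimate baseline, subtract, detect what remains".

```python
import statistics

def running_median(signal, window=5):
    """Crude baseline: the local median is insensitive to brief transients."""
    half = window // 2
    return [statistics.median(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]

def detect_transients(signal, threshold=0.5):
    baseline = running_median(signal)
    diff = [s - b for s, b in zip(signal, baseline)]   # the "difference image"
    return [i for i, d in enumerate(diff) if d > threshold]

# Synthetic fluorescence trace with one transient at index 3:
trace = [1.0, 1.1, 1.0, 3.0, 1.0, 0.9, 1.0]
hits = detect_transients(trace)
```

Because the median ignores the brief spike, subtraction isolates it cleanly; the real pipeline does the analogous operation on whole video frames and then clusters the surviving pixels in space and time.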
Patel, H.; Crosslin, D.; Jarvik, G. P.; Hall, T.; Veenstra, D.; Xie, S.
The lack of user-centered design principles in the current landscape of commonly used bioinformatics software tools poses challenges for novice genomics researchers (NGRs) entering the genomics ecosystem. Comparing the usability of one analysis tool to another is a non-trivial task and requires evaluation criteria that incorporate perspectives from both the existing literature and a diverse, underrepresented user base of NGRs. To better characterize these barriers, we used a two-pronged approach: a literature review of existing bioinformatics tools and semi-structured interviews exploring the needs of NGRs. Across both knowledge sources, the key attributes behind poor adoption and sustained use of most bioinformatics tools were poor documentation, lack of readily accessible informational content, challenges with installation and dependency coordination, and inconsistent error messages and progress indicators. Combining the findings from the literature review with the insights gained by interviewing the NGRs, we created an evaluation rubric that can be used to grade existing and future bioinformatics tools. This rubric summarizes the key components software tools need in order to cater to the diverse needs of both NGRs and experienced users. Given the rapidly evolving nature of genomics research, it becomes increasingly important to critically evaluate existing tools and develop new ones that will help build a strong foundation for future exploration.
Lutfi, A.; Chen, Z. A.; Fischer, L.; Rappsilber, J.
Mass spectrometry (MS) experiments generate rich acquisition metadata that are essential for reproducibility, data sharing, and quality control (QC). Because these metadata are typically stored only in vendor-specific formats, they often remain difficult to access. MetaXtract is a lightweight tool that extracts detailed parameters directly from Thermo Fisher raw files and exposes them in structured, tabular formats. By capturing sample information, LC-MS method settings, and scan-level metrics such as retention time, total ion current, and ion injection time, MetaXtract increases transparency and ensures that essential acquisition details accompany published data and results in an easily readable form. This supports FAIR data practices by improving the findability, accessibility, interoperability, and reusability of MS datasets after converting them to other formats, thereby increasing the value of deposition in public repositories. The importance of such metadata accessibility was recently highlighted by the crosslinking mass spectrometry community in efforts to advance FAIR data principles, and it extends to MS-based omics approaches more broadly. Importantly, MetaXtract enables search-free, near real-time performance monitoring by relying on acquisition-side signals, providing actionable indicators immediately after data acquisition rather than after database searching. This also supports streamlined internal QC and troubleshooting for laboratories and data repositories through integration into automated pipelines. By embedding acquisition parameters into routine data handling, MetaXtract strengthens reproducibility, optimises method development, and supports large-scale applications, including machine learning and secondary data analysis.
Graphical Abstract (Figure 1).
Highlights:
- Metadata extraction from Thermo Fisher raw files
- Enhanced findability, accessibility, interoperability, and reusability of deposited data
- Integration into workflows via GUI and command-line modes
- Troubleshooting support by visualizing MS1/MS2 scan details
- Indexed MS1/MS2 peak list export enabling machine learning workflows
Availability: MetaXtract is available for free download as open-source software at https://github.com/Rappsilber-Laboratory/MetaXtract; the software is licensed under the Apache-2.0 license.
Roldan, A.; Duran, T. G.; Far, A. J.; Capa, M.; Arboleda, E.; Cancellario, T.
The era of Big Data has reshaped biodiversity research, yet the potential of this information is frequently constrained by data heterogeneity, incompatible schemas, and the fragmentation of resources. Whilst standards such as Darwin Core have improved interoperability, significant barriers persist in harmonising multi-typology datasets ranging from taxonomy and genetics to species distribution. Here, we present the Biodiversity Observatory System (BiOS), a comprehensive, open-source software stack designed to address these impediments through a modular, community-driven architecture. BiOS departs from monolithic database designs by decoupling the back-end data management from the front-end presentation layer. This architectural separation supports a dual-access model tailored to diverse stakeholder needs. For researchers and developers, the system offers a comprehensive Application Programming Interface (API) that exposes all back-end functionalities, enabling seamless programmatic access, automated data retrieval, and integration with external analytical workflows. Simultaneously, the platform features a user web interface designed to lower the technical barrier to entry. This interface facilitates intuitive data exploration through agile taxonomic navigation, advanced geospatial map viewers for species occurrence filtering, and dedicated dashboards for visualising genetic markers and legislative status. Strictly adhering to the FAIR principles (Findable, Accessible, Interoperable, Reusable), BiOS acts as a relational engine capable of integrating heterogeneous data streams. By providing a flexible, interoperable core that supports the "seven shortfalls" framework of biodiversity knowledge, BiOS offers a turnkey solution to overcome data fragmentation and enhance collaborative conservation efforts.
Kim, M. E.; Rudravaram, G.; Saunders, A.; Gao, C.; Ramadass, K.; Newlin, N. R.; Kanakaraj, P.; Bogdanov, S.; Archer, D.; Hohman, T. J.; Jefferson, A. L.; Morgan, V. L.; Roche, A.; Englot, D. J.; Resnick, S. M.; Beason-Held, L. L.; Bilgel, M.; Cutting, L.; Barquero, L. A.; D'arcangel, M. A.; Nguyen, T. Q.; Humphreys, K. L.; Niu, Y.; Vinci-Booher, S.; Cascio, C. J.; Pechman, K. R.; Shashikumar, N.; The HABS-HD Study Team; Alzheimer's Disease Neuroimaging Initiative; The BIOCARD Study Team; Li, Z.; Vandekar, S. N.; Zhang, P.; Gore, J. C.; Liu, Y.; Zuo, L.; Schilling, K. G.; Moyer, D. C.
Brain charts, or normative models of quantitative neuroimaging measures, can identify trajectories of brain development and abnormalities in groups and individuals by leveraging large populations. Recent work has extended these brain charts to model microstructural and macrostructural features of white matter. Assessments of variance for these brain charts are necessary to determine whether the models being used for these data are stable. We implement an analytic approach to characterize variability of the parameters in previously released brain charts created using the generalized additive models for location, scale, and shape (GAMLSS) framework. Additionally, we empirically validate the accuracy of each analytic model through a comparison to a bootstrapping approach from 0.2 to 90 years of age. We find that across all models, the analytic coefficient of variation (COV) remains below 5% for ages greater than 0.25 years, with the maximum empirical observed COV reaching 7% at 0.2 years of age. Further, the empirical assessment shows high agreement with the analytic assessment, with COV estimates averaged across the lifespan for all models having a Pearson correlation coefficient of 0.776 and a mean difference of 4 × 10^-4. Both methods exhibit volume and surface area as the features with the largest average COV for the majority of tracts. However, the analytic assessment yields axial diffusivity as the feature most frequently having the smallest COV, whereas the corresponding feature for the empirical assessment is average length. These results suggest that the analytic approach overestimates model stability for WM brain charts when the COV is low and that the validation method is suitable for assessing whether GAMLSS models are unstable.
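The stability metric used throughout this abstract, the coefficient of variation (COV = standard deviation / mean), can be made concrete with a bootstrap-style sketch. The repeated parameter estimates below are synthetic, for illustration only; as in the paper's criterion, a COV under 5% indicates a stable fit.

```python
import statistics

def cov(estimates):
    """Coefficient of variation: sample std. dev. divided by the mean."""
    return statistics.stdev(estimates) / statistics.fmean(estimates)

# Synthetic refits of one model parameter across bootstrap resamples:
bootstrap_estimates = [0.98, 1.02, 1.00, 0.99, 1.01]
stability = cov(bootstrap_estimates)
```

The empirical assessment in the paper amounts to computing this over many resampled refits per parameter and age, then comparing against the closed-form analytic estimate.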
Riendeau, J. M.; Hockerman, L.; Maly, E.; Samimi, K. M.; Skala, M. C.
Significance: Standard methods to characterize peripheral blood mononuclear cells (PBMCs) are often destructive, lack metabolic information, or do not provide single-cell resolution. Label-free tools that non-destructively measure single-cell metabolism within PBMCs can provide new layers of information to characterize disease state and cell therapy potential. Aim: Determine whether non-destructive fluorescence lifetime imaging microscopy (FLIM) of endogenous metabolic co-factors NAD(P)H and FAD, or optical metabolic imaging (OMI), can identify immune cell subsets and activation state within heterogeneous PBMC cultures. Approach: OMI measured single-cell metabolism of PBMCs from 3 different human donors in the quiescent or activated (phorbol 12-myristate 13-acetate and ionomycin) state. Fluorescent antibodies were used as ground truth labels for single-cell classifiers of immune cell subtypes. Results: OMI identified quiescent vs. activated PBMCs with 93% accuracy at only 2 hours post-stimulation, identified monocytes within quiescent and activated PBMCs with 96% and 88% accuracy, respectively, and identified NK cells within quiescent and activated PBMCs with 74% accuracy. Conclusion: OMI identifies activation state and immune cell subpopulations within PBMCs, enabling single-cell and label-free measurements of metabolic heterogeneity within complex PBMC samples. Therefore, OMI could enhance PBMC immunophenotyping for diagnostic and therapeutic applications. Statement of Discovery: We demonstrate that autofluorescence lifetime imaging can resolve functional and phenotypic metabolic subpopulations within a mixed culture of immune cells from human blood. This provides a new technique to characterize metabolic activity within immune cells from the peripheral blood of patients, which could improve disease diagnostics and the production of cell therapies.
Martelli, E.; Ratto, M. L.; Nuvolari, B.; Arigoni, M.; Tao, J.; Micocci, F. M. A.; Alessandri, L.
Background: Achieving FAIR-compliant computational research in bioinformatics is systematically undermined by two compounding challenges that existing tools leave unresolved: long-term reproducibility and accessibility. Standard package managers re-download dependencies from live repositories at every build, making environments vulnerable to library disappearance and version drift, and pinning a package version does not pin the versions of its transitive dependencies, causing divergences between builds performed at different points in time. Compounding this, packages from repositories such as CRAN, Bioconductor, and PyPI frequently omit critical system-level dependencies from their installation metadata, leaving users to manually discover which underlying library is missing or which version is required. Beyond these technical failures, constructing a truly reproducible environment demands expertise in containerization, making reproducibility in practice a privilege rather than a standard. Findings: We present REBEL (Reproducible Environment Builder for Explicit Library Resolution), a framework that addresses both challenges through three dependency inference heuristics: (i) Deep Inspection of source code, (ii) Fuzzy Matching against a manually curated knowledge base, and (iii) Conservative Dependency Locking. The resolved dependency stack is then archived into a self-contained local store, enabling offline, deterministic rebuilds at any future time. We compared the installation of 1,000 randomly sampled CRAN packages in isolated Docker containers using the standard package manager versus REBEL; REBEL resolved 149 of 328 standard installation failures (45.4%). Moreover, through its DockerBuilder component, REBEL generates fully reproducible Docker images from a plain-text requirements file, making deterministic environment construction accessible without expertise in containerization.
Conclusions: REBEL provides a practical foundation for FAIR-compliant, long-term reproducible bioinformatics analyses, making deterministic environment construction accessible to researchers regardless of their technical background. REBEL is freely available at https://github.com/Rebel-Project-Core
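The core idea behind generating a reproducible image from a requirements file can be sketched as follows. This is an illustration only: the function name, base image, and Dockerfile layout are assumptions, not REBEL's actual DockerBuilder interface.

```python
def dockerfile_from_lockfile(pinned):
    """Build Dockerfile text from a list of exact pins ("name==version").

    Because every dependency version is fixed up front, the resulting image
    build is deterministic with respect to package versions (assuming the
    pinned artifacts remain retrievable, which is why a local archive of the
    resolved stack matters for long-term reproducibility).
    """
    install = " ".join(pinned)
    lines = [
        "FROM python:3.11-slim",  # assumed base image, pinned by tag
        f"RUN pip install --no-cache-dir {install}",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical fully pinned requirements, as read from a plain-text file
pins = ["numpy==1.26.4", "pandas==2.2.2"]
dockerfile = dockerfile_from_lockfile(pins)
print(dockerfile)
```

The key design point is that the plain-text pin list is the single source of truth: anyone holding the same file can rebuild an equivalent environment without containerization expertise.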
Anderson, J. K.; Zhang, J.; Ge, X.; Fan, H.; Leng, Y.; Silverstein, M.; Conrad, R.; Li, Z.; Holmes, E.; Joseph, S. S.; Lu, S.; Shinohara, R.; Li, T.; Johnson, W. E.; Alzheimer's Disease Neuroimaging Initiative
Batch effect correction is a common and often necessary step in data analysis to reduce bias due to technical and experimental factors when combining multiple batches of data. The severity of the batch effects dictates the correction strategy; therefore, a careful assessment of each dataset's batch effects is necessary. BatchQC is an R package that provides reproducible tools and visualizations for quantitatively and qualitatively addressing batch effects across a broad range of data types. BatchQC integrates with standardized Bioconductor data structures and features an object-oriented design, enabling the application of workflows that can freely evaluate and process data within and outside the package tools. Common batch evaluation methods, along with novel quantitative metrics, help determine the benefits of batch correction for each dataset and enable direct comparisons between methods. Here, we present BatchQC as the first comprehensive batch-correction R package, with independent tools, reproducible workflows, visualization, and novel statistics.
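One generic statistic that a quantitative batch-effect assessment can rest on is the fraction of each feature's variance explained by batch labels (eta-squared from a one-way ANOVA decomposition). The sketch below is a plain-Python illustration of that statistic on synthetic data, not a BatchQC function:

```python
import numpy as np

def batch_variance_explained(X, batches):
    """Per-feature fraction of variance explained by batch (eta-squared).

    X: (samples x features) matrix; batches: per-sample batch labels.
    Values near 1 signal strong batch effects that likely warrant
    correction; values near 0 suggest correction may be unnecessary.
    """
    batches = np.asarray(batches)
    grand = X.mean(axis=0)
    ss_total = ((X - grand) ** 2).sum(axis=0)
    ss_between = np.zeros(X.shape[1])
    for b in np.unique(batches):
        grp = X[batches == b]
        # Between-batch sum of squares: group size times squared offset
        ss_between += grp.shape[0] * (grp.mean(axis=0) - grand) ** 2
    return ss_between / ss_total

rng = np.random.default_rng(1)
n, p = 100, 5
batches = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[batches == 1] += 3.0  # inject a strong additive batch shift
eta2 = batch_variance_explained(X, batches)
print(eta2.round(2))
```

Because the injected shift is large relative to the within-batch noise, eta-squared comes out high for every feature, flagging the batch factor as a dominant source of variation.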